Abstract — In this paper we introduce the first application of the Belief Propagation (BP) algorithm in the design of recommender systems. We formulate the recommendation problem as an inference problem and aim to compute the marginal probability distributions of the variables which represent the ratings to be predicted. However, computing these marginal probability functions is computationally prohibitive for large-scale systems. Therefore, we utilize the BP algorithm to efficiently compute these functions. Recommendations for each active user are then iteratively computed by probabilistic message passing. Unlike previous recommender algorithms, BPRS does not need to solve the recommendation problem for all users in order to update the recommendations for a single active user. Further, BPRS computes the recommendations for each user with linear complexity and without requiring a training period. Via computer simulations (using the 100K MovieLens dataset), we verify that BPRS iteratively reduces the error in the predicted ratings of the users until it converges. Finally, we confirm that BPRS is comparable to state-of-the-art methods such as the Correlation-based neighborhood model (CorNgbr) and Singular Value Decomposition (SVD) in terms of rating and precision accuracy. Therefore, we believe that the BP-based recommendation algorithm is a promising new approach that offers a significant scalability advantage while providing competitive accuracy for recommender systems.

I. Introduction 

Today, the quantity of available information grows rapidly, making it hard for consumers to discover useful information and filter out irrelevant items. Thus, users face the challenge of finding the most relevant information or items in a short amount of time. Recommender systems aim to address this overload problem by suggesting to users the items that match their interests and preferences. More generally, recommender systems can learn about user preferences and profiles over time, based on data mining algorithms, and automatically suggest products (from a large space of possible options) that fit the users' needs. Hence, it is foreseeable that the social web will be driven by these recommender systems.

However, there are certain challenges in designing scalable, accurate and dependable recommender systems. The data available to recommender systems is incomplete, uncertain, inconsistent and/or intentionally contaminated. Further, since new data (ratings) become available continuously, recommendations need to be updated at frequent intervals, which imposes computational limitations on large-scale systems. Latent factor models (such as Matrix Factorization) have proven to be the most accurate methods in the Root Mean Square Error (RMSE) sense. However, most existing and highly popular Matrix Factorization-based recommender algorithms have been shown to be prone to malicious behavior [?], and they have scalability issues. In other words, they fall short of incorporating the attack profiles and the extra noise generated by malicious users. Further, each new update (using the most recent data or ratings) for a particular active user requires solving the entire problem for every user in the system. Hence, new research is needed on algorithms that meet these challenges and provide scalable, accurate and dependable recommender systems.

In this paper we introduce the first application of Belief Propagation (BP), an iterative probabilistic algorithm, to the recommendation problem. We have applied BP to trust and reputation systems in our previous work [?], [5]. In such systems, BP is used to solve the inference problem of finding the global reputation of service providers in a network based on the previous ratings of the users. The main difference between trust and reputation systems and recommender systems is that in the former the inference problem has to be solved globally, whereas in the latter the inferences are local and specific to each user. In [?] and [?], we studied the reputation system for delay-tolerant networks (DTN) and P2P networks, respectively.

The key observation we make is that recommender systems deal with complicated global functions of many variables (e.g., users and items). By using a factor graph, we can obtain a qualitative representation of how the users and items are related on a graphical structure. Therefore, we propose to model the recommender system on a factor graph, with the goal of computing the marginal probability distribution functions of the variables representing the ratings to be predicted for the users. However, we observe that computing the marginal probability functions is computationally prohibitive for large-scale recommender systems. Therefore, we utilize the BP algorithm to efficiently compute these marginal probability distributions. The key advantage of the BP algorithm is that it computes the marginal distributions with a complexity that grows linearly with the number of nodes (i.e., users/items).

Hereafter, we refer to our scheme as the "Belief Propagation Based Iterative Recommender System" (BPRS). BPRS has several prominent features. First, it does not need to solve the problem for all users in order to update the predictions for a single active user, and it does not require a training period to utilize the most recent data (ratings). Second, its complexity remains linear per single user, making it very attractive for large-scale systems. Therefore, it can update the recommendations for each active (online) user instantaneously using the most recent data (ratings). Further, we show that BPRS provides usage prediction and rating prediction accuracy comparable to other popular methods such as the Correlation-based neighborhood model (CorNgbr) and Singular Value Decomposition (SVD). Therefore, we are optimistic that this work opens a new direction for recommender systems that are scalable, accurate, and resilient to attacks.

The rest of this paper is organized as follows. In the rest of this section, we summarize the related work. In Section II, we describe the proposed BPRS in detail. Next, in Section III, we evaluate BPRS via computer simulations using the MovieLens dataset. Finally, Section IV concludes the paper.

A. Related Work 

Recommender systems [6] can be classified into two main categories: i) content-based filtering [7], in which the system uses behavioral data about a user to recommend items similar to those previously consumed by the user, and ii) collaborative filtering [8], in which the system compares one user's behavior against the other users' behaviors and identifies items that were preferred by similar users. Collaborative filtering algorithms further fall into two general classes: memory-based [9] and model-based algorithms [10], [11]. Model-based algorithms include methods exploiting Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Maximum Margin Matrix Factorization (MMMF) techniques [12], [13].

The application of Bayesian networks and message passing algorithms to recommender systems has also been studied in the past [14], [15]. In [14], the message passing technique is used to determine the latent factors of the users and items (as an alternative to SVD). In [15], because of the fuzziness associated with the ambiguity in the description of the ratings, a (non-iterative) inference among the users is proposed to remove this ambiguity. The key difference between our approach and the other message passing-based methods is that we describe the recommendation problem as computing marginal likelihood distributions from complicated global functions of many variables, and we use Belief Propagation (BP) to find them. This is inspired by successful applications of BP algorithms in various fields such as decoding of error correcting codes [16], Artificial Intelligence [17], and reputation systems [?].

II. Belief Propagation for Recommender Systems

Belief Propagation (BP) [16], [17] is a message passing algorithm for performing inference on graphical models (e.g., factor graphs, Bayesian networks, Markov random fields). It has demonstrated empirical success in numerous applications, including LDPC codes, turbo codes, free energy approximation, and satisfiability. BP is a method for computing the marginal distributions of the unobserved nodes conditioned on the observed ones.

Our objective is to formulate the recommendation problem as making statistical inference about the ratings of users for unseen items based on observations. That is, given the evidence in the past data, what is the likelihood (probability) that the rating takes a particular value? Here, the probability is the degree of belief to which the prediction of the rating is supported by the available evidence. This requires finding the marginal probability distributions of the variables representing the ratings of the items to be predicted, conditioned on some observed preferences.

We assume two different sets in the system: i) the set of users $\mathbf{U}$ and ii) the set of items (products) $\mathbf{I}$. Users provide feedback, in the form of ratings, about the items for which they have an opinion. The main goal is to provide accurate recommendations for every user by predicting the user's ratings for the items that he/she has not rated before (unseen items). Here, we consider an arbitrary user $z$ (referred to as the active user) and compute the predicted ratings of user $z$ for the unseen items. We assume $u$ users and $s$ items in the system (i.e., $|\mathbf{U}| = u$ and $|\mathbf{I}| = s$). Let $\mathbf{G}_z = \{G_{zj} : j \in \mathbf{I}\}$ be the collection of variables representing the ratings of the items to be predicted for the active user $z$. Note that a subset of these variables is already known, as the corresponding items were rated by user $z$; hence, they do not require any prediction. Let also $\mathbf{R}_z = \{R_{zi} : i \in \mathbf{U}\}$ be the system's confidence in the reliability of each user's ratings, given that the active user is $z$. Further, we let $T_{ij}$ represent the rating previously provided by user $i$ for item $j$. We denote by $\mathbf{T}$ the $s \times u$ item-user matrix that stores these ratings, and by $\mathbf{T}_i$ the set of ratings provided by user $i$. We note that some rating entries could be missing (corresponding to unseen items). To be consistent with most existing recommender systems, we assume that the rating values are integers from the set $\mathcal{T} = \{1, 2, 3, 4, 5\}$.

The recommendation problem can be viewed as finding the marginal probability distribution of each variable in $\mathbf{G}_z$, given the observed data (i.e., the existing ratings and the system's confidence in the users' ratings). There are $s$ marginal probability functions, $p(G_{zj} \mid \mathbf{T}, \mathbf{R}_z)$, each of which is associated with a variable $G_{zj}$, the predicted rating of item $j$ for user $z$. We formulate the problem by considering the global function $p(\mathbf{G}_z \mid \mathbf{T}, \mathbf{R}_z)$, which is the joint probability distribution function of the variables in $\mathbf{G}_z$ given the rating matrix and the system's confidence in the users' ratings. Then, each marginal probability function $p(G_{zj} \mid \mathbf{T}, \mathbf{R}_z)$ may be obtained as follows:

$$ p(G_{zj} \mid \mathbf{T}, \mathbf{R}_z) = \sum_{\mathbf{G}_z \setminus \{G_{zj}\}} p(\mathbf{G}_z \mid \mathbf{T}, \mathbf{R}_z), \tag{1} $$

where the notation $\mathbf{G}_z \setminus \{G_{zj}\}$ denotes all variables in $\mathbf{G}_z$ except $G_{zj}$.
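To make the cost of (1) concrete, here is a minimal Python sketch that evaluates one marginal by brute-force summation over the joint. The function `toy_joint` is a hypothetical stand-in for the joint distribution of the ratings to be predicted, not the model used by BPRS; the point is only that the loop visits $5^{s}$ joint assignments for $s$ items.

```python
import itertools
from typing import Callable, Dict, Sequence

RATINGS = (1, 2, 3, 4, 5)  # rating set {1, ..., 5}


def brute_force_marginal(joint: Callable[[Sequence[int]], float],
                         num_vars: int, j: int) -> Dict[int, float]:
    """Marginal of variable j by summing the joint over all the others,
    i.e., a direct evaluation of (1)."""
    marginal = {r: 0.0 for r in RATINGS}
    # Enumerates all 5**num_vars joint assignments.
    for assignment in itertools.product(RATINGS, repeat=num_vars):
        marginal[assignment[j]] += joint(assignment)
    total = sum(marginal.values())  # normalizes an unnormalized joint
    return {r: p / total for r, p in marginal.items()}


def toy_joint(assignment: Sequence[int]) -> float:
    """Hypothetical unnormalized joint favoring similar neighboring ratings;
    a stand-in for the real joint distribution, not the paper's model."""
    score = 1.0
    for a, b in zip(assignment, assignment[1:]):
        score *= 1.0 / (1.0 + abs(a - b))
    return score


if __name__ == "__main__":
    for s in (5, 10, 20):
        print(f"{s} items -> {len(RATINGS) ** s} joint assignments")
    print(brute_force_marginal(toy_joint, num_vars=4, j=0))
```

With only 20 items the loop would already visit roughly $9.5 \times 10^{13}$ assignments, which is why the direct computation is ruled out.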



Unfortunately, the number of terms in (1) grows exponentially with the number of variables, making the direct computation infeasible for large-scale systems. Instead, we propose to factorize the joint distribution in (1) into local functions $f_i$ using a factor graph, and to use the BP algorithm to calculate the marginal probability distributions with linear complexity. A factor graph is a bipartite graph containing two sets of nodes (corresponding to variables and factors) and edges connecting nodes of the two sets. Following [16], we form a factor graph by setting a variable node for each variable $G_{zj}$, a factor node for each function $f_i$, and an edge connecting variable node $j$ to factor node $i$ if and only if $G_{zj}$ is an argument of $f_i$.

We arrange the collection of the users and items, together with the ratings provided by the users, as a factor graph $g(\mathbf{U}, \mathbf{I})$. Since we consider a particular active user $z$, the factor graph is then reduced to $g(\tilde{\mathbf{U}}, \mathbf{I})$ (as in Fig. 1) by keeping only the users that are connected to $z$ via a path of length at most two in $g(\mathbf{U}, \mathbf{I})$ (i.e., the users who rated at least one item that is also rated by $z$) and removing all the other user nodes from the graph together with their edges. In this representation, each user corresponds to a factor node in the graph, shown as a square, and each item is represented by a variable node, shown as a hexagon. Further, each rating is represented by an edge from the factor node to the variable node. Hence, if a user $i$ ($i \in \tilde{\mathbf{U}}$) has a rating about item $j$ ($j \in \mathbf{I}$), we place an edge with value $T_{ij}$ from the factor node $i$ to the variable node representing item $j$. Eventually, the graph $g(\tilde{\mathbf{U}}, \mathbf{I})$ has $|\tilde{\mathbf{U}}| = \tilde{u}$ users and $|\mathbf{I}| = s$ items.
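The reduction to the 2-hop neighborhood of the active user can be sketched as below. This is an illustrative Python sketch, assuming ratings are held in a dictionary keyed by (user, item) pairs; the function name `reduced_factor_graph` and the data layout are hypothetical, not part of BPRS as specified.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Ratings = Dict[Tuple[str, str], int]  # (user, item) -> rating T_ij


def reduced_factor_graph(ratings: Ratings, z: str):
    """Reduce the user-item graph to active user z's point of view.

    Keeps z and every user who rated at least one item that z rated
    (users within two hops of z); all other users and their edges go.
    Returns (kept_users, item_neighbors, user_neighbors), where
    item_neighbors[a] plays the role of N_a and user_neighbors[k] of N_k,
    each mapping a neighbor to the rating on the connecting edge.
    """
    items_of_z: Set[str] = {item for (user, item) in ratings if user == z}
    kept_users: Set[str] = {user for (user, item) in ratings if item in items_of_z}
    kept_users.add(z)

    user_neighbors: Dict[str, Dict[str, int]] = defaultdict(dict)
    item_neighbors: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (user, item), value in ratings.items():
        if user in kept_users:  # edge from factor node `user` to variable node `item`
            user_neighbors[user][item] = value
            item_neighbors[item][user] = value
    return kept_users, dict(item_neighbors), dict(user_neighbors)


if __name__ == "__main__":
    ratings = {("z", "m1"): 5, ("z", "m2"): 3, ("k", "m1"): 4,
               ("k", "m3"): 2, ("w", "m4"): 1}
    users, n_item, n_user = reduced_factor_graph(ratings, "z")
    print(sorted(users))  # ['k', 'z']: user w shares no item with z
```

Items rated only by removed users end up with no edges, so they simply do not appear in `item_neighbors`.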
Fig. 1. Graphical representation of the scheme from user z's point of view.

Next, we suppose that the global function $p(\mathbf{G}_z \mid \mathbf{T}, \mathbf{R}_z)$ factors into a product of several local functions, each having a subset of the variables in $\mathbf{G}_z$ as arguments:

$$ p(\mathbf{G}_z \mid \mathbf{T}, \mathbf{R}_z) = \frac{1}{Z} \prod_{i} f_i(\mathcal{G}_{zi}, \mathbf{T}_i, R_{zi}), \tag{2} $$

where $Z$ is the normalization constant and $\mathcal{G}_{zi}$ is a subset of $\mathbf{G}_z$. Hence, in the graph representation of Fig. 1, each factor node is associated with a local function, and each local function $f_i$ represents the probability distribution of its arguments given the system's confidence in the associated user and the existing ratings of that user.

We now describe the message exchange between a user $k$ and an item $a$ (in Fig. 1), given that the active user in BPRS is $z$. We clarify that all the messages are formed by the algorithm that is run at a central authority. We denote the sets of neighbors of the variable node $a$ and of the factor nodes $k$ and $z$ (in $g(\tilde{\mathbf{U}}, \mathbf{I})$) by $N_a$, $N_k$, and $N_z$, respectively (the neighbors of an item are the users who rated the item, while the neighbors of a user are the items that the user rated). Further, let $\mathcal{S} = N_a \setminus \{k\}$ and $\mathcal{A} = N_k \setminus \{a\}$. Let $G^{(v)}_{zj}$ and $R^{(v)}_{zi}$ be the value of the variable $G_{zj}$ and the system's confidence in user $i$ at iteration $v$ of the algorithm, respectively. The message $\lambda^{(v)}_{k \to a}(G^{(v)}_{za})$ (from the factor node $k$ to the variable node $a$) denotes the relative probabilities that $G^{(v)}_{za} = \ell$ ($\ell \in \mathcal{T}$) at the $v$-th iteration, given $T_{ka}$ and $R^{(v-1)}_{zk}$. On the other hand, $\mu^{(v)}_{a \to k}(G^{(v)}_{za})$ (from the variable node $a$ to the factor node $k$) denotes the probability that $G^{(v)}_{za} = \ell$ ($\ell \in \mathcal{T}$) at the $v$-th iteration.

The message from the factor node $k$ to the variable node $a$ at the $v$-th iteration is formed using the principles of BP as

$$ \lambda^{(v)}_{k \to a}(G^{(v)}_{za}) = \sum_{\mathcal{G}^{(v)}_{zk} \setminus \{G^{(v)}_{za}\}} f_k\big(\mathcal{G}^{(v)}_{zk}, \mathbf{T}_k, R^{(v-1)}_{zk}\big) \prod_{x \in \mathcal{A}} \mu^{(v-1)}_{x \to k}(G^{(v)}_{zx}), \tag{3} $$

where $\mathcal{G}_{zk}$ is the set of variable nodes that are the arguments of the local function $f_k$ at the factor node $k$. This message transfer is illustrated in the right half of Fig. 2. Further, $R^{(v-1)}_{zk}$ is a value between zero and one and can be calculated as follows:

$$ R^{(v-1)}_{zk} = 1 - \frac{1}{\rho\,|N_k|} \sum_{i \in N_k} \Big|\, T_{ki} - \sum_{\ell \in \mathcal{T}} \ell\, \mu^{(v-1)}_{i \to k}(\ell) \Big|. \tag{4} $$

The above equation can be interpreted as one minus the average inconsistency of user $k$, calculated using the messages it received from all its neighbors. Further, $\rho$, the highest possible deviation of a user's rating, is set to 4 in this particular rating system, where the rating values are integers from the set $\mathcal{T}$. Thus, the reliability of users (in their ratings) is measured based on the messages formed by the algorithm.
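A minimal Python sketch of the confidence update, under one explicit assumption: the inconsistency of user $k$ on item $i$ is taken to be the gap between its rating $T_{ki}$ and the expected rating under the incoming message $\mu_{i \to k}$, normalized by $\rho = 4$. That choice of inconsistency measure is our reading of "calculated using the messages it received", and the names `confidence` and `expected_rating` are hypothetical.

```python
from typing import Dict

RATINGS = (1, 2, 3, 4, 5)
RHO = 4  # largest possible deviation between two ratings in {1, ..., 5}


def expected_rating(message: Dict[int, float]) -> float:
    """Mean rating under a normalized message over {1, ..., 5}."""
    return sum(r * message[r] for r in RATINGS)


def confidence(ratings_of_k: Dict[str, int],
               messages_to_k: Dict[str, Dict[int, float]]) -> float:
    """One minus user k's average normalized inconsistency.

    ratings_of_k:  item i -> T_ki for each neighbor i in N_k.
    messages_to_k: item i -> message mu_{i->k}.
    Assumption: the inconsistency on item i is |T_ki - E[mu_{i->k}]|.
    """
    total_deviation = sum(abs(t - expected_rating(messages_to_k[i]))
                          for i, t in ratings_of_k.items())
    # Each deviation is at most RHO, so the result stays in [0, 1].
    return 1.0 - total_deviation / (RHO * len(ratings_of_k))
```

A user whose ratings always match the mean of the incoming messages gets confidence 1; a user who rates 5 where the messages put all mass on 1 gets confidence 0.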
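Using (2), and assuming that the predicted ratings in the set $\mathcal{G}_{zk}$ are independent of each other at each intermediate step (to reduce the computational complexity), it can be shown that

$$ f_k\big(\mathcal{G}^{(v)}_{zk}, \mathbf{T}_k, R^{(v-1)}_{zk}\big) = \prod_{i \in N_k} p\big(G^{(v)}_{zi} \mid T_{ki}, R^{(v-1)}_{zk}\big). \tag{5} $$

Thus, the message in (3) becomes

$$ \lambda^{(v)}_{k \to a}(G^{(v)}_{za}) = p\big(G^{(v)}_{za} \mid T_{ka}, R^{(v-1)}_{zk}\big) \times \Bigg\{ \sum_{\mathcal{G}^{(v)}_{zk} \setminus \{G^{(v)}_{za}\}} \Big[ \prod_{x \in \mathcal{A}} p\big(G^{(v)}_{zx} \mid T_{kx}, R^{(v-1)}_{zk}\big)\, \mu^{(v-1)}_{x \to k}(G^{(v)}_{zx}) \Big] \Bigg\}. \tag{6} $$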
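The claim that the second factor of (6) does not depend on $G_{za}$ can be checked numerically. The sketch below is a hypothetical example: random distributions stand in for the per-item factors $p(G_{zx} \mid T_{kx}, R_{zk})$ of (5) and for the incoming messages $\mu_{x \to k}$, and user $k$ is assumed to have only three neighbors. It evaluates (3) by explicit summation and compares the result with the first factor of (6) times a constant.

```python
import itertools
import math
import random
from typing import Dict

RATINGS = (1, 2, 3, 4, 5)
rng = random.Random(0)


def random_distribution() -> Dict[int, float]:
    weights = [rng.random() for _ in RATINGS]
    total = sum(weights)
    return {r: w / total for r, w in zip(RATINGS, weights)}


# Hypothetical user k with neighbors a, x1, x2 (so A = {x1, x2}).
# p_k[x][l] stands for p(G_zx = l | T_kx, R_zk); mu[x] for mu_{x->k}.
p_k = {x: random_distribution() for x in ("a", "x1", "x2")}
mu = {x: random_distribution() for x in ("x1", "x2")}
A = ("x1", "x2")


def message_by_summation(g_a: int) -> float:
    """(3) with f_k factorized as in (5): explicit sum over G_zx, x in A."""
    total = 0.0
    for values in itertools.product(RATINGS, repeat=len(A)):
        term = p_k["a"][g_a]
        for x, g in zip(A, values):
            term *= p_k[x][g] * mu[x][g]
        total += term
    return total


# The bracketed factor in (6): a product of per-neighbor sums, free of G_za.
bracket = math.prod(sum(p_k[x][g] * mu[x][g] for g in RATINGS) for x in A)

for g_a in RATINGS:
    assert math.isclose(message_by_summation(g_a), p_k["a"][g_a] * bracket)
print("lambda_{k->a}(l) = p(G_za = l | T_ka, R_zk) * constant for all l")
```

The constant factorizes into one sum per neighbor in $\mathcal{A}$, so BP only needs $\lambda_{k \to a}$ up to proportionality.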



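Since the second factor of (6) does not depend on $G^{(v)}_{za}$, it is a constant, and hence $\lambda^{(v)}_{k \to a}(G^{(v)}_{za}) \propto p\big(G^{(v)}_{za} \mid T_{ka}, R^{(v-1)}_{zk}\big)$, where

$$ p\big(G^{(v)}_{za} = \ell \mid T_{ka}, R^{(v-1)}_{zk}\big) = \begin{cases} R^{(v-1)}_{zk} + \big(1 - R^{(v-1)}_{zk}\big) \dfrac{|n_a(\ell)|}{\sum_{h \in \mathcal{T}} |n_a(h)|}, & \text{if } T_{ka} = \ell, \\[1ex] \big(1 - R^{(v-1)}_{zk}\big) \dfrac{|n_a(\ell)|}{\sum_{h \in \mathcal{T}} |n_a(h)|}, & \text{if } T_{ka} \neq \ell. \end{cases} \tag{7} $$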



Here, $n_a$ denotes the genre (i.e., type) or the set of genres of item $a$. Further, $|n_a(h)|$ is the number of items in the same genre as $n_a$ that were previously rated as $h$ by the active user $z$. The way we compute the probabilities in (7) resembles the belief/plausibility concept of the Dempster-Shafer Theory [?]. Given $T_{ka} = 1$, $R^{(v-1)}_{zk}$ can be viewed as the belief of user $k$ that $G_{za}$ is one (at the $v$-th iteration). In other words, in the eyes of user $k$, $G^{(v)}_{za}$ is equal to one with probability $R^{(v-1)}_{zk}$. Thus, $(1 - R^{(v-1)}_{zk})$ corresponds to the uncertainty in the belief of user $k$. In order to remove this uncertainty and express $p(G^{(v)}_{za} \mid T_{ka}, R^{(v-1)}_{zk})$ as the probabilities that $G_{za}$ is $\ell$ ($\ell \in \mathcal{T}$), we distribute the uncertainty among the possible outcomes (one to five) in proportion to the histogram of the ratings provided by the active user $z$ for the items in the same genre as $n_a$. That is, if the active user previously provided high ratings for the items in the same genre as $n_a$, then we distribute most of the uncertainty to the higher ratings, in proportion to the rating histogram of the active user for the items in that genre. Similarly, if the active user previously provided low ratings for the items in the same genre as $n_a$, we distribute most of the uncertainty to the lower ratings. Therefore, from user $k$'s point of view, $G_{za}$ is equal to one with probability $R^{(v-1)}_{zk} + (1 - R^{(v-1)}_{zk}) \times \frac{|n_a(1)|}{\sum_{h \in \mathcal{T}} |n_a(h)|}$. On the other hand, it is equal to $\ell$ ($\ell \neq 1$) with probability $(1 - R^{(v-1)}_{zk}) \times \frac{|n_a(\ell)|}{\sum_{h \in \mathcal{T}} |n_a(h)|}$. We note that the above discussion assumed $T_{ka} = 1$; similar statements hold for the cases $T_{ka} = 2, 3, 4, 5$. It is worth clarifying that, as opposed to the Dempster-Shafer Theory, we do not combine the beliefs of the users. Instead, we consider the belief of each user individually and calculate the probabilities that $G_{za}$ equals $\ell$ ($\ell \in \mathcal{T}$) in the eyes of each user as in (7). We note that if the active user $z$ did not rate any items from this particular genre ($n_a$), we distribute the uncertainty in proportion to the average rating of user $z$ over the items it previously rated, $A_z = \frac{\sum_{j \in N_z} T_{zj}}{|N_z|}$. The above computation in (7) must be performed for every neighbor of each factor node.
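The rule in (7) can be written as a short Python function. This is a sketch, assuming the genre histogram $|n_a(\ell)|$ of the active user is passed in as a dictionary; the name `rating_probabilities` is hypothetical, and the fallback to the average rating $A_z$ is left unimplemented because the text does not give its exact form.

```python
from typing import Dict

RATINGS = (1, 2, 3, 4, 5)


def rating_probabilities(t_ka: int, r_zk: float,
                         genre_hist: Dict[int, int]) -> Dict[int, float]:
    """p(G_za = l | T_ka, R_zk) for l in {1, ..., 5}, following (7).

    t_ka:       rating that user k gave item a.
    r_zk:       system's confidence in user k, between 0 and 1.
    genre_hist: l -> |n_a(l)|, the number of items in a's genre that the
                active user z rated as l.
    """
    total = sum(genre_hist.get(r, 0) for r in RATINGS)
    if total == 0:
        # The text falls back on z's average rating A_z here, but does not
        # spell out the exact rule, so this sketch leaves that case open.
        raise ValueError("active user z has no ratings in this genre")
    probs = {}
    for r in RATINGS:
        # The uncertainty 1 - R_zk is spread in proportion to z's histogram.
        probs[r] = (1.0 - r_zk) * genre_hist.get(r, 0) / total
    probs[t_ka] += r_zk  # belief mass of user k on the rating it gave
    return probs
```

For example, with $T_{ka} = 4$, $R_{zk} = 0.7$ and a histogram of three 5s and one 4, the rating 4 receives probability $0.7 + 0.3 \times 1/4 = 0.775$ and the rating 5 receives $0.3 \times 3/4 = 0.225$.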
This finishes the first half of the $v$-th iteration. In the second half of the $v$-th iteration, we calculate the message $\mu^{(v)}_{a \to k}(G^{(v)}_{za})$ by multiplying all the probabilities that the variable node $a$ received from its neighbors, excluding the one from the factor node $k$, as shown in the left half of Fig. 2. We note that the previous ratings of the active user play a key role in the algorithm. Hence, the values of those variables in $\mathbf{G}_z$ that are associated with the items already rated by the active user $z$ are set to the corresponding ratings (i.e., $G_{zj} = T_{zj}$ if $j \in N_z$). Thus, if $a \in N_z$, the messages generated by the variable node $a$ do not vary across iterations, since the value of this variable node ($G_{za}$) is fixed by the ratings of the active user. Therefore, the message from the variable node $a$ to the factor node $k$ at the $v$-th iteration is given by

$$ \mu^{(v)}_{a \to k}\big(G^{(v)}_{za} = \ell\big) = \begin{cases} \dfrac{\prod_{i \in \mathcal{S}} \lambda^{(v)}_{i \to a}(\ell)}{\sum_{h \in \mathcal{T}} \prod_{i \in \mathcal{S}} \lambda^{(v)}_{i \to a}(h)}, & \text{if } a \notin N_z, \\[1ex] 1, & \text{if } a \in N_z \text{ and } T_{za} = \ell, \\ 0, & \text{if } a \in N_z \text{ and } T_{za} \neq \ell. \end{cases} \tag{8} $$

Fig. 2. Message exchange between the factor node k and the variable node a.
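The variable-to-factor message (8) is sketched below in Python. Normalizing the product over the neighbors of $a$ other than $k$ into a probability distribution is an assumption of this sketch, as are the function name `variable_to_factor` and its arguments; items already rated by the active user return a point mass on that rating.

```python
import math
from typing import Dict, Optional

RATINGS = (1, 2, 3, 4, 5)


def variable_to_factor(lambdas_to_a: Dict[str, Dict[int, float]], k: str,
                       active_rating: Optional[int] = None) -> Dict[int, float]:
    """Message mu_{a->k} over {1, ..., 5}, following (8).

    lambdas_to_a:  user i -> lambda_{i->a} for every neighbor i in N_a.
    k:             the factor node the message is sent to (excluded).
    active_rating: T_za if the active user z already rated item a.
    """
    if active_rating is not None:
        # G_za is fixed to the active user's rating: a point mass.
        return {r: 1.0 if r == active_rating else 0.0 for r in RATINGS}
    # Product over S = N_a minus {k}; normalizing it is this sketch's
    # reading of "probability" in the text.
    product = {r: math.prod(msg[r] for i, msg in lambdas_to_a.items() if i != k)
               for r in RATINGS}
    total = sum(product.values())
    if total == 0.0:
        raise ValueError("incoming messages have disjoint support")
    return {r: v / total for r, v in product.items()}
```

Excluding $k$ is what keeps a user's own opinion from being echoed back to it in the next iteration.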



The algorithm proceeds to the next iteration in the same way as the $v$-th iteration. We clarify that the iterative algorithm starts by computing $\lambda^{(1)}_{k \to a}$ using $R^{(0)}_{zk} = g$, where $g$ ($0 < g < 1$) is the system's present confidence in the users for the reliability of their ratings, computed at the previous execution of the algorithm. At the end of each iteration, the upper equation in (8), with the following modification, is used to compute the predicted ratings of the active user $z$. That is, we use the set $N_a$ instead of $\mathcal{S}$ in (8) to compute $\mu^{(v)}_{a}(G^{(v)}_{za})$ for every item $a$ that the active user $z$ has not rated. Then, we set $G^{(v)}_{za} = \sum_{\ell=1}^{5} \ell\, \mu^{(v)}_{a}(\ell)$. The iterations stop when the $G_{zj}$ values converge for every item $j$.
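The end-of-iteration prediction step can be sketched as follows, assuming the incoming messages $\lambda_{i \to a}$ for an unrated item $a$ are available as dictionaries over the ratings 1 to 5. The product runs over all of $N_a$, is normalized (an assumption), and the predicted rating is its expectation. The function names and the convergence tolerance `tol` are hypothetical, since the text does not fix a threshold.

```python
import math
from typing import Dict

RATINGS = (1, 2, 3, 4, 5)


def predict_rating(lambdas_to_a: Dict[str, Dict[int, float]]) -> float:
    """Predicted G_za: normalize the product of lambda_{i->a} over all of
    N_a (no exclusion) and return the expected rating sum_l l * mu_a(l)."""
    belief = {r: math.prod(msg[r] for msg in lambdas_to_a.values()) for r in RATINGS}
    total = sum(belief.values())
    return sum(r * b for r, b in belief.items()) / total


def has_converged(previous: Dict[str, float], current: Dict[str, float],
                  tol: float = 1e-4) -> bool:
    """True when every predicted rating moved by at most tol.
    tol is a hypothetical threshold; the text only says the values converge."""
    return all(abs(current[j] - previous[j]) <= tol for j in current)
```

Because the prediction is an expectation, it is a real number in $[1, 5]$ rather than an integer rating, which is what the RMSE evaluation below compares against.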

III. Evaluation of BPRS 

We evaluate the performance of BPRS using the 100K MovieLens dataset. The dataset contains 100,000 ratings from 943 users on 1682 items (movies), where each user has rated at least 20 items. Further, the rating values are integers from 1 to 5. Based on our simulations, we observed that BPRS converges, on average, in 10 iterations. Therefore, for the remainder of this section, we show our results either during the first 10 iterations or after the 10th iteration.

A. Prediction Accuracy 

We evaluate the rating prediction accuracy of BPRS in terms of the Root Mean Square Error (RMSE) of the predicted ratings. Each test dataset is created by an 80%/20% split of the full data into training and test data. We then used the training data (80% of the whole dataset) to predict the ratings in the test dataset. We computed the RMSE as follows:



$$ \mathrm{RMSE} = \sqrt{\frac{1}{|K|} \sum_{(i,j) \in K} \big(G_{ij} - \hat{G}_{ij}\big)^2}, \tag{9} $$
Fig. 3. Performance of BPRS in RMSE vs. number of iterations when (i) all users and (ii) only the 2-hop neighbors are used.

where $|K|$ is the number of ratings (to be predicted) in the test dataset, $G_{ij}$ is the actual value of the rating provided by user $i$ for item $j$ in the test dataset, and $\hat{G}_{ij}$ is the rating value predicted by the algorithm.
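The evaluation protocol (an 80%/20% split followed by (9)) is sketched below in Python. Shuffling the ratings uniformly at random with a fixed seed is an assumption; the text does not state how the split was drawn. The function names `split_ratings` and `rmse` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Triple = Tuple[int, int, int]  # (user, item, rating)


def split_ratings(triples: Sequence[Triple], train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List[Triple], List[Triple]]:
    """Random 80%/20% train/test split; the shuffling is an assumption."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error over the |K| test ratings, as in (9)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((g - g_hat) ** 2 for g, g_hat in zip(actual, predicted))
    return math.sqrt(squared / len(actual))
```

For instance, predicting 3 and 5 for two test ratings whose true values are 3 and 4 gives an RMSE of $\sqrt{0.5} \approx 0.707$.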

In Fig. 3, we show the RMSE provided by BPRS for two different scenarios: when all users connected to each active user via a path are used, and when only the 2-hop neighbors of each active user are used in the algorithm. We observe that keeping only the 2-hop neighbors of each active user provides better performance in terms of RMSE.

Finally, we evaluated BPRS against some popular recommendation algorithms: 1. MovieAvg (which predicts the rating of a movie by averaging all the ratings received by that movie), with an RMSE of 1.053; 2. the Correlation-based neighborhood model (CorNgbr), with an RMSE of 0.9406 [?]; and 3. the SVD latent factor model with 50 factors, with an RMSE of 0.9046 [?]. We conclude that BPRS is comparable to existing methods such as CorNgbr and SVD in terms of rating prediction accuracy. Moreover, BPRS generates recommendations with linear complexity for each active user and updates the recommendations for each active user instantaneously using the most recent data.
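For reference, the MovieAvg baseline described above amounts to a per-movie mean of the training ratings. The sketch assumes (user, movie, rating) triples; the function names and the handling of movies with no training ratings (`fallback`) are hypothetical choices the text does not specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def movie_avg(train: Iterable[Tuple[int, int, int]]) -> Dict[int, float]:
    """MovieAvg baseline: each movie's mean rating in the training data."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for _user, movie, rating in train:
        sums[movie] += rating
        counts[movie] += 1
    return {movie: sums[movie] / counts[movie] for movie in sums}


def predict_movie_avg(means: Dict[int, float], movie: int, fallback: float) -> float:
    # Movies without training ratings have no average; `fallback` (e.g. the
    # global mean rating) is a hypothetical choice not specified in the text.
    return means.get(movie, fallback)
```

MovieAvg ignores who is asking, which is why it serves as the weakest of the three baselines.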

B. Computational Complexity 

Assuming $u$ users and $s$ items in the system, we obtained the computational complexity of BPRS (in number of multiplications) as $\max(O(cs), O(cu))$ per active user, where $c$ is the average number of nonzero elements in each row of the user-item matrix. Due to the sparseness of the user-item matrix, the coefficient $c$ is a small number. Further, as discussed before, BPRS converges, on average, in 10 iterations. Hence, we did not include the number of iterations in the complexity measure, as it only introduces a small constant factor in front of the total complexity. This result indicates that BPRS can compute the recommendations for each active user very efficiently using the most recent data (ratings). Therefore, we claim that the BP-based approach to the recommendation problem is very promising and can result in a new class of accurate and scalable recommender systems.



IV. Conclusion 

In this paper, we introduced the Belief Propagation Based Iterative Recommender System (BPRS). BPRS formulates the recommendation problem as making statistical inference about the ratings of users for unseen items based on observations. The complexity of BPRS remains linear per active user, making it very attractive for large-scale systems. Further, it can update the recommendations for each active user instantaneously using the most recent data (ratings) and without solving the recommendation problem for all users. While providing these significant scalability advantages over existing methods, BPRS also provides rating prediction accuracy comparable to that of popular methods, as we showed.